We investigate the performance of a variant of Axelrod's model for dissemination of culture, the Adaptive Culture Heuristic (ACH), on solving an NP-complete optimization problem, namely, the classification of binary input patterns of size F by a Boolean Binary Perceptron. In this heuristic, N agents, characterized by binary strings of length F which represent possible solutions to the optimization problem, are fixed at the sites of a square lattice and interact with their nearest neighbors only. The interactions are such that the agents' strings (or cultures) become more similar to the low-cost strings of their neighbors, resulting in the dissemination of these strings across the lattice. Eventually the dynamics freezes into a homogeneous absorbing configuration in which all agents exhibit identical solutions to the optimization problem. We find through extensive simulations that the probability of finding the optimal solution is a function of the reduced variable $F/N^{1/4}$, so that the number of agents must increase with the fourth power of the problem size, $N \propto F^4$, to guarantee a fixed probability of success. In this case, we find that the relaxation time to reach an absorbing configuration scales with $F^6$, which can be interpreted as the overall computational cost of the ACH to find an optimal set of weights for a Boolean Binary Perceptron, given a fixed probability of success.

PACS numbers: 87.23.Ge, 89.75.Da, 89.70.Eg, 05.50.+q



I. INTRODUCTION 

In the early eighties, the perception that the dynamics of the celebrated Hopfield model of associative memory [1] was solving an optimization problem, namely, that of finding which stored pattern is closest to the input configuration, led to the proposal of a powerful general-purpose optimization heuristic, the so-called Hopfield-Tank neural network [2]. A similar situation happened in the late nineties, when Kennedy [3] pointed out that Axelrod's model of culture dissemination [4] could work as a collective problem-solving system provided that one associates the cultures of the agents (represented by strings of integer numbers) with the trial solutions of a given optimization problem. That proof-of-concept paper thus demonstrated that social interaction is a natural computation method.

In contrast with the Hopfield-Tank neural network, the optimization heuristic based on social interaction, which henceforth we refer to as the Adaptive Culture Heuristic (ACH), has not enjoyed great popularity among the physics and computer science communities, perhaps because of the appearance at the same time of a related algorithm, called particle swarm optimization, which has by now become an established optimization paradigm [5, 6]. Particle swarm optimization, however, is best suited to searching spaces of real-valued variables, whereas ACH is suited to exploring configuration spaces of discrete-valued variables, which is the case for most combinatorial optimization problems that have attracted the attention of the statistical physics community [7]. Here we attempt to change this situation by showing that the performance of the ACH seems to scale very favorably (it improves exponentially fast) with the number of agents in the system.

Following Axelrod's model [3], the ACH requires a population of $N = L^2$ agents placed at the sites of a square
lattice of size L x L with periodic boundary conditions. 
The agents can interact with their four nearest neighbors 
only. Each agent is characterized by a binary string of 
length F, which represents the agent's solution to the 
optimization problem in the ACH interpretation. In Ax- 
elrod's model this string, which is not necessarily binary, 
represents the culture of the agent. The interaction be- 
tween any two neighboring agents occurs whenever the 
agents have different strings, regardless of their associ- 
ated cost, and it is such that the string of the agent with 
the higher cost solution is slightly modified to become 
more similar to that of the more efficient partner. 

We recall that in Axelrod's model the interaction be- 
tween two neighboring agents takes place with probabil- 
ity proportional to the number of entries their cultural 
strings have in common and so agents with completely 
different cultures do not interact. When the agents are allowed to interact, the interaction results in an increase of the similarity between the cultures of the two agents, as in the ACH update rule. The fact that some agents are prohibited from interacting is the key ingredient for the existence of stable globally polarized states (i.e., culturally heterogeneous absorbing configurations), which is the major outcome of Axelrod's model [3]. In the ACH, however, we seek homogeneous absorbing configurations associated with low-cost solutions of the target optimization problem, and so the homogenizing interactions are always allowed regardless of the similarity between the strings of the neighboring agents [5].

In order to obtain statistically reliable results on the 
scaling of the performance of ACH with the size F of 
the optimization problem and the number N of agents in 
the lattice, we focus on a specific optimization problem 
which involves the manipulation of binary variables only, 
namely, the categorization of binary input patterns by the Boolean Binary Perceptron. This is an NP-complete problem [5] for which there is no efficient specific heuristic optimization method available [9], and whose random version has received considerable attention from the statistical mechanics community (see, e.g., [10-15]) because its phase diagram exhibits a frozen phase similar to that of the Random Energy Model [16].

The main result of this paper is that, given a fixed probability of success, the overall computational cost of ACH to find a minimum-cost solution for the learning problem in a Boolean Binary Perceptron scales with the sixth power of the size of the input string. Of course, this finding has no implication for the celebrated NP ≠ P conjecture of computer science, since the $F^6$ scaling holds for typical realizations of the input-output mapping, rather than for all realizations as would be required to disprove that assertion. In addition, ACH is not a deterministic algorithm, which disqualifies the heuristic as a candidate to disprove the NP ≠ P conjecture.

The rest of this paper is organized as follows. First we introduce the target optimization problem - categorization of binary patterns by the Boolean Binary Perceptron - on which we will measure the performance of the Adaptive Culture Heuristic (Sect. II). This heuristic is then described in detail in Sect. III, and the results of its performance on the training task, measured by the probability that the heuristic finds a minimum-cost solution, are presented in Sect. IV. In this section we also present the performance of the ACH in the case where the agents are placed at the nodes of random symmetric graphs and argue that the square lattice connectivity C = 4 yields the best performance. Finally, in Sect. V we present our concluding remarks.

II. THE BOOLEAN BINARY PERCEPTRON 

The Boolean Binary Perceptron is a single-layer neural network whose weights are constrained to take on binary values only. More explicitly, the network consists of an input layer with F binary neurons $s_k = \pm 1$, $k = 1, \ldots, F$, with each input neuron connected to the output unit $o = \pm 1$ through the weights $w_k = \pm 1$, $k = 1, \ldots, F$. The state of the output unit is given by the equation

$$o = \mathrm{sign}\left( \sum_{k=1}^{F} w_k s_k \right) \qquad (1)$$

where $\mathrm{sign}(x) = 1$ for $x > 0$ and $-1$ otherwise. We will restrict F to take on odd integer values only, so we can guarantee that the argument of the sign function will never vanish. The learning task is to find a set of weights $w_k = \pm 1$, $k = 1, \ldots, F$ that emulates the input-output mapping $(s_1^l, \ldots, s_F^l) \to t^l$ for $l = 1, \ldots, M$. If the weights were allowed to assume real values, this learning task could easily be accomplished by the perceptron learning algorithm or by the Widrow-Hoff rule [17]. However, when the binary constraint is taken into account the learning task becomes an NP-complete problem, since it is equivalent to integer programming [5]. Assuming that NP ≠ P, this means that no deterministic algorithm can find $w_k$, $k = 1, \ldots, F$ (if it exists) for every realization of the input-output mapping in a time that grows polynomially with the parameter F.

Here we focus on random versions of the input-output mapping where the input entries $s_k^l$ are statistically independent random variables chosen as $\pm 1$ with equal probability. As for the output $t^l$ we consider two schemes. In the first scheme, we choose $t^l = \pm 1$ at random with equal probability, the so-called random output mapping. In this case, it is not possible to guarantee that there is a set of binary weights that emulates the input-output mapping perfectly. In fact, statistical mechanics studies based on the landmark paper by Gardner [18] show that in the limit $F \to \infty$ there are sets of weights that emulate the mapping perfectly (zero-cost solutions) provided that the ratio $\alpha = M/F$ is less than $\alpha_c^r \approx 0.83$ [TT1IT5]. So, in this limit, we say that the input-output mapping is linearly Boolean separable for $\alpha < \alpha_c^r$.

However, it is convenient to consider input-output mappings that are linearly Boolean separable for any choice of the parameters F and M. This observation motivates the second scheme to set the values of the outputs $t^l$, which are given by

$$t^l = \mathrm{sign}\left( \sum_{k=1}^{F} w_k^0 s_k^l \right) \qquad (2)$$

for $l = 1, \ldots, M$. Here $w_k^0$, $k = 1, \ldots, F$ are statistically independent random variables that take on the values $\pm 1$ with equal probability. Clearly, such an input-output mapping is linearly Boolean separable by construction, since the set of binary weights $w_k^0$, $k = 1, \ldots, F$ emulates it perfectly. The solution weight space of this problem was studied numerically [TU] H2] and analytically [H], resulting in the conclusion that in the limit $F \to \infty$ the only solution to the mapping is the teacher perceptron $w_k^0$, $k = 1, \ldots, F$, for $\alpha > \alpha_c^0 \approx 1.245$.
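The two ways of generating the outputs can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the authors' code: the function names `random_output_mapping` and `teacher_mapping`, and the convention of storing the patterns as an M × F array `S` with `S[l, k]` holding $s_k^l$, are assumptions made here.

```python
import numpy as np

def sign(x):
    # sign(x) = 1 for x > 0 and -1 otherwise, as defined below Eq. (1)
    return np.where(x > 0, 1, -1)

def random_output_mapping(F, M, rng):
    """First scheme: inputs s_k^l and outputs t^l are independent unbiased +/-1."""
    S = rng.choice([-1, 1], size=(M, F))   # S[l, k] holds s_k^l
    t = rng.choice([-1, 1], size=M)        # t[l] holds t^l
    return S, t

def teacher_mapping(F, M, rng):
    """Second scheme, Eq. (2): outputs produced by a random binary teacher w^0."""
    if F % 2 == 0:
        raise ValueError("F must be odd so that the field never vanishes")
    S = rng.choice([-1, 1], size=(M, F))
    w0 = rng.choice([-1, 1], size=F)       # teacher weights w_k^0
    t = sign(S @ w0)
    return S, t, w0

# Hypothetical usage: M = 2F, the choice made in Sect. IV
rng = np.random.default_rng(0)
S, t, w0 = teacher_mapping(F=21, M=42, rng=rng)
```

Keeping F odd inside `teacher_mapping` mirrors the restriction stated below Eq. (1), so the sign function never receives a zero argument.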

From the perspective of interpreting the neural network training as an optimization problem, we define the following cost function

$$E(\mathbf{w}) = \sum_{l=1}^{M} \Theta\left( -t^l \sum_{k=1}^{F} w_k s_k^l \right) \qquad (3)$$

where $\Theta(x) = 1$ if $x > 0$ and $0$ otherwise. Hence the cost E yields the number of misclassified inputs, and so its minimum (optimum) value is zero in the case of a linearly Boolean separable mapping.
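A direct transcription of the cost (3), under the same array conventions as in the sketch of the mappings (an assumption, not taken from the paper). Because F is odd the field $\sum_k w_k s_k^l$ never vanishes, so $\Theta(-t^l \sum_k w_k s_k^l) = 1$ exactly when pattern l is misclassified.

```python
import numpy as np

def cost(w, S, t):
    """Number of misclassified patterns, Eq. (3).

    w : (F,) array of +/-1 weights
    S : (M, F) array with S[l, k] = s_k^l
    t : (M,) array of desired outputs t^l
    """
    fields = S @ w                          # sum_k w_k s_k^l for every pattern l
    return int(np.sum(-t * fields > 0))     # Theta(x) = 1 for x > 0, else 0
```

With outputs generated by a teacher, `cost(w0, S, t)` is zero by construction.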

In this paper we will concentrate mostly on the linearly Boolean separable mappings defined by Eq. (2) because in this case the optimal solution is known a priori, so we can evaluate the performance of the ACH for relatively large problems (F < 200), whereas for the random output mapping we are restricted to the range F < 25, since we need to carry out an exhaustive search over the $2^F$ possible weight configurations in order to find the minimum-cost solution. However, our findings indicate that, regarding the scaling with respect to the relevant parameters of the problem, the performance of the heuristic is essentially the same regardless of whether the mapping is linearly Boolean separable or not.
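The exhaustive search mentioned above simply scans all $2^F$ weight vectors. The following is a naive sketch under the same assumed array conventions; it is fine for small F, but near F = 25 a pure-Python loop over roughly $3 \times 10^7$ configurations would be very slow and would in practice need vectorization or compiled code.

```python
import itertools
import numpy as np

def minimum_cost_exhaustive(S, t):
    """Scan all 2^F binary weight vectors; return (minimum cost, one minimizer)."""
    M, F = S.shape
    best_cost, best_w = M + 1, None
    for bits in itertools.product((-1, 1), repeat=F):
        w = np.array(bits)
        c = int(np.sum(-t * (S @ w) > 0))   # cost (3) of this configuration
        if c < best_cost:
            best_cost, best_w = c, w
    return best_cost, best_w
```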



III. THE ADAPTIVE CULTURE HEURISTIC 

The set of weights of a Boolean Binary Perceptron is 
completely specified by a binary string of length F. In 
the adaptive culture heuristic, each such string is interpreted as the culture of an agent and its cost, given by Eq. (3), measures the unworthiness of the culture. The
idea behind the ACH is that the agents should prefer to 
adopt more valuable cultures, i.e., those cultures associ- 
ated with low cost values [3]. In this context, it is more 
convenient to refer to the strings that characterize the 
agents as solutions rather than cultures. 

As already pointed out, the agents are fixed at the sites of a square lattice of size L × L with periodic boundary conditions and can interact with their four nearest neighbors only. At each time step we pick an agent at random (this is the target agent) as well as one of its four neighbors. These two agents will interact provided that the cost E of the solution associated with the target agent is greater than or equal to the cost of the solution associated with the randomly selected neighbor. An interaction consists of selecting at random, and then flipping, one of the entries that distinguish the target agent from its neighbor. Note that only the string of the target agent is updated, i.e., the agent with the higher cost solution is changed to become more similar to its neighbor. This change may actually increase the cost of the solution of the target agent, due to the highly nonlinear dependence of the cost (3) on the individual entries of the binary string. This procedure is repeated until the dynamics freezes in a homogeneous absorbing configuration. We can guarantee that the frozen configurations are homogeneous because we allow interactions, and thus changes in the target agent, even when the two interacting agents have the same cost value.
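One interaction of the ACH, as described in this paragraph, might look as follows. This is a sketch under stated assumptions: agents are stored row-wise in an (L², F) array with agent i at lattice site (i // L, i % L), costs are cached in a separate array, and `ach_step` is a hypothetical name.

```python
import numpy as np

def cost(w, S, t):
    # Number of misclassified patterns, Eq. (3)
    return int(np.sum(-t * (S @ w) > 0))

def ach_step(agents, costs, L, S, t, rng):
    """Attempt one ACH interaction on an L x L periodic lattice (in place).

    agents : (L*L, F) array of +/-1 strings; agent i sits at (i // L, i % L)
    costs  : (L*L,) array caching cost(agents[i], S, t)
    Returns True if the target agent's string was modified.
    """
    i = int(rng.integers(L * L))                   # target agent
    x, y = divmod(i, L)
    dx, dy = [(1, 0), (-1, 0), (0, 1), (0, -1)][rng.integers(4)]
    j = ((x + dx) % L) * L + (y + dy) % L          # randomly chosen neighbour
    if costs[i] < costs[j]:
        return False                               # only a target with cost >= neighbour's changes
    diff = np.flatnonzero(agents[i] != agents[j])  # entries where the strings differ
    if diff.size == 0:
        return False
    k = rng.choice(diff)                           # pick one of them at random ...
    agents[i, k] = agents[j, k]                    # ... and flip it to match the neighbour
    costs[i] = cost(agents[i], S, t)               # the new cost may be higher or lower
    return True
```

Caching the costs means only one call to the cost function per accepted interaction, which matters because evaluating Eq. (3) costs O(MF) operations.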

Because of the need to re-calculate the cost function 
after each interaction, the implementation of the ACH 
to search for near optimal weights of the Boolean Binary 
Perceptron is a very computationally demanding problem 
and so an extensive statistical analysis of the performance 
of this heuristic requires a highly optimized code. In par- 
ticular, to simulate efficiently the ACH for large lattices 
we use a procedure based on the concept of active agents 
(see [TH EQ]). An active agent is an agent whose solution 
differs from the solution of at least one of its four neigh- 
bors. Clearly, only active agents can change their strings 
and so it is more efficient to select the target agent ran- 
domly from the list of active agents rather than from the 
entire lattice. In the case that the solution string of the target agent is modified by the updating rule, we need to re-examine the active/inactive status of the target agent as well as of all its neighbors so as to update the list of
active agents. The dynamics is frozen when the list of 
active agents is empty. Note that the cost of the solution 
string plays no role in the definition of active agents. 
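The active-agent bookkeeping can be sketched as below. Besides the array layout assumed for the single-step sketch, the list-plus-index structure used here for O(1) random selection, insertion and removal is our choice, not necessarily the one used by the authors; the hypothetical `run_ach` counts attempted interactions, which is only one possible way of measuring a relaxation time.

```python
import numpy as np

def cost(w, S, t):
    # Number of misclassified patterns, Eq. (3)
    return int(np.sum(-t * (S @ w) > 0))

def lattice_neighbours(L):
    # Four nearest neighbours on an L x L lattice with periodic boundaries
    nbrs = []
    for i in range(L * L):
        x, y = divmod(i, L)
        nbrs.append([((x + 1) % L) * L + y, ((x - 1) % L) * L + y,
                     x * L + (y + 1) % L, x * L + (y - 1) % L])
    return nbrs

def run_ach(S, t, L, rng):
    """Run the ACH from a random start until no active agent is left.

    Returns the frozen (N, F) array of strings and the number of attempted
    interactions.
    """
    F = S.shape[1]
    N = L * L
    agents = rng.choice([-1, 1], size=(N, F))
    costs = np.array([cost(a, S, t) for a in agents])
    nbrs = lattice_neighbours(L)
    active, pos = [], {}                 # active agents + their index in the list

    def update_status(i):
        is_active = any(not np.array_equal(agents[i], agents[j]) for j in nbrs[i])
        if is_active and i not in pos:
            pos[i] = len(active)
            active.append(i)
        elif not is_active and i in pos:
            last = active.pop()          # fill the freed slot with the last entry
            if last != i:
                active[pos[i]] = last
                pos[last] = pos[i]
            del pos[i]

    for i in range(N):
        update_status(i)

    attempts = 0
    while active:                        # frozen when the list is empty
        i = active[rng.integers(len(active))]     # target: a random active agent
        j = nbrs[i][rng.integers(len(nbrs[i]))]   # one of its neighbours
        attempts += 1
        if costs[i] < costs[j]:
            continue                     # target strictly better: no change
        diff = np.flatnonzero(agents[i] != agents[j])
        if diff.size == 0:
            continue
        k = rng.choice(diff)
        agents[i, k] = agents[j, k]
        costs[i] = cost(agents[i], S, t)
        for a in [i] + nbrs[i]:          # only these agents can change status
            update_status(a)
    return agents, attempts
```

As in the text, the cost plays no role in `update_status`; only string equality with the neighbours matters.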



IV. RESULTS 

All our results are obtained for M = 2F, i.e., $\alpha = 2 > \alpha_c^0 \approx 1.245$, so that for the linearly Boolean separable case the teacher set of weights $w^0$ is the only global minimum (zero-cost) solution of the cost function (3), provided that F is sufficiently large. However, what is crucial for our purposes is the knowledge that for any value of F there is at least one solution for which the cost is zero, so that we can focus on the number of runs of the ACH that result in this minimal cost, regardless of whether the actual solution found by the heuristic is the teacher solution or another degenerate zero-cost solution. In particular, for each realization of the input-output mapping we run the ACH for 100 random initial settings of the agents' solutions and calculate the fraction of runs for which the heuristic reaches a minimum-cost solution. This fraction is then averaged over a variable number, ranging from 500 to $10^6$, of realizations of the input-output mapping.
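The estimator described above is a mean of per-realization success fractions. A sketch, assuming the final costs of all runs have been collected in an array; the standard error over realizations is our addition (one natural choice for the error bars quoted in the figure captions), not a procedure stated in the text, and `success_probability` is a hypothetical name.

```python
import numpy as np

def success_probability(final_costs, min_costs):
    """Estimate P_m.

    final_costs : (R, runs) array; final_costs[r, n] is the cost of the frozen
                  configuration of run n on realization r of the mapping
    min_costs   : (R,) array of the known minimum cost of each realization
                  (zero for linearly Boolean separable mappings)
    Returns (mean success fraction, standard error over realizations).
    """
    final_costs = np.asarray(final_costs)
    min_costs = np.asarray(min_costs)
    fractions = (final_costs == min_costs[:, None]).mean(axis=1)
    if len(fractions) > 1:
        stderr = fractions.std(ddof=1) / np.sqrt(len(fractions))
    else:
        stderr = 0.0
    return fractions.mean(), stderr
```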

As pointed out before, most of our results are for the linearly Boolean separable case, since in this case we know by construction the cost of the optimum solution and so we can study the performance of the heuristic for large values of F. At the end of this section we present some results for the random Boolean mapping in the region F < 25, since in that case we first need to perform an exhaustive search in the solution space to find the minimum cost. The main quantity we focus on here is the mean fraction of runs for which the heuristic reached the minimum-cost solution, which can be interpreted as the probability $P_m$ that a run of the ACH finds the optimum cost. This quantity is shown in Fig. 1 for the linearly Boolean separable case as a function of the size F of the problem and of the number N of agents in the system.

Figure 1 reveals a most surprising aspect of the performance of the ACH, namely, that for small N, say $N = 5^2$, a fourfold increase in the number of agents in the system increases the probability of finding an optimal solution by several orders of magnitude. Actually, this observation holds true even for large N, provided that F is large enough. To quantify this observation, in Fig. 2 we show how $P_m$ approaches 1 as the number of agents N increases for two values of the input size F. This analysis shows that for $N > 30^2$, the probability $1 - P_m$ that the heuristic fails to find the optimum cost vanishes like $\exp(-a_F N^{1/4})$, where the (fitting) parameter $a_F$ is inversely proportional to F.
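Because the fit of Fig. 2 is a straight line on a semi-logarithmic plot against $N^{1/4}$, the parameters $a_F$ and $b_F$ can be extracted by linear least squares on $\ln(1 - P_m)$. The sketch below assumes the $N^{1/4}$ form quoted in the text rather than testing it, and `fit_failure_decay` is a hypothetical helper; it is not the authors' fitting procedure.

```python
import numpy as np

def fit_failure_decay(N, P_m):
    """Least-squares fit of ln(1 - P_m) = ln b_F - a_F * N**(1/4) at fixed F.

    N   : agent numbers (e.g. 30**2, 40**2, ...)
    P_m : measured success probabilities, all strictly below 1
    Returns (a_F, b_F).
    """
    x = np.asarray(N, dtype=float) ** 0.25
    y = np.log1p(-np.asarray(P_m, dtype=float))   # ln(1 - P_m)
    slope, intercept = np.polyfit(x, y, 1)        # highest degree first
    return -slope, np.exp(intercept)
```

Repeating the fit for several F and checking that $a_F F$ stays roughly constant is one way to probe the stated $a_F \propto 1/F$ behaviour.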

These findings prompt us to redraw Fig. 1 in terms of the rescaled variable $u = F/N^{1/4}$, which is done in Fig. 3. The collapse of the data for $N > 30^2$ onto a single curve implies that $P_m = g(u)$. We note that the failure of the scaling function $g(u)$ to describe the data for $N < 30^2$ was already expected from the results of Fig. 2. In fact, those results show that in the limit $u \to 0$ we have $1 - g(u) \sim \exp(-a/u)$ with $a \approx 0.5$.

FIG. 1. The probability that a run of the ACH finds a zero-cost solution for linearly Boolean separable mappings as a function of the input size F for lattices with (left to right) $N = 5^2, 10^2, 20^2, 30^2, 40^2, 50^2$ and $60^2$ agents. The error bars are smaller than the sizes of the symbols and the lines are guides to the eye.

FIG. 2. Semi-logarithmic plot of the probability $1 - P_m$ that a run of the ACH does not find a zero-cost solution for linearly Boolean separable mappings as a function of $N^{1/4}$ for F = 91 (○) and F = 41 (△). The dashed straight lines are the fittings $1 - P_m = b_F \exp(-a_F N^{1/4})$.

FIG. 3. The same data exhibited in Fig. 1 plotted in terms of the rescaled variable $u = F/N^{1/4}$. The data for $N > 30^2$ lie on approximately the same curve, given by the scaling function $P_m = g(u)$.

FIG. 4. Semi-logarithmic plot of the probability that a run of the ACH finds a zero-cost solution for linearly Boolean separable mappings as a function of the input size F for $N = 5^2$ (+) and $N = 10^2$ (×). The error bars are smaller than the sizes of the symbols. The solid straight line yields the probability that the optimal solution is chosen in a random selection, $2^{-F}$, whereas the dashed straight lines are the fittings $P_m = b_N \exp(-a_N F)$.
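The link between the fit of Fig. 2 and the collapse variable u can be made explicit in a few lines. This assumes, as stated above, that $a_F$ is inversely proportional to F, written $a_F = a/F$, and additionally that the prefactor $b_F$ depends only weakly on F (an assumption, written here as a constant b):

```latex
1 - P_m \simeq b_F \, e^{-a_F N^{1/4}}, \qquad a_F = \frac{a}{F}
\;\Longrightarrow\;
1 - P_m \simeq b \, \exp\!\left(-\frac{a\,N^{1/4}}{F}\right) = b \, e^{-a/u},
\qquad u = \frac{F}{N^{1/4}} .
```

Under these assumptions $P_m$ depends on F and N only through u, and $1 - g(u)$ vanishes like $e^{-a/u}$ as $u \to 0$.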

The study of the scaling function $g(u)$ in the other extreme limit, $u \to \infty$, requires very large input sizes (F > 200) for relatively large lattices ($N > 30^2$), which is computationally infeasible because of the need to use a huge number of samples to get reliable statistics, since $P_m \to 0$ in this limit. Nevertheless, in Fig. 4 we present such an analysis in the case of the small lattices $N = 5^2$ and $N = 10^2$, for which we know the scaling behavior is not valid. As expected, the results show that $P_m$ vanishes exponentially with increasing F, i.e., $P_m \sim \exp(-a_N F)$. Here the fitting parameter is given by $a_N \sim 1/N^{1/2}$, indicating that for small N the gain in performance obtained by increasing the number of agents is much larger than the gain in the scaling regime, where $a_N \sim 1/N^{1/4}$. In addition, Fig. 4 is useful to highlight the enormous gain in performance resulting from the increase of the number of agents involved in the optimization procedure.

FIG. 5. Scaled relaxation time of the ACH as a function of the input size F for $N = 30^2$ (○), $40^2$ (△) and $50^2$ (▽). The error bars are smaller than the sizes of the symbols. The dashed curve is the fitting $T/N = 0.12 F^2$.

FIG. 6. Comparison between the performances of the ACH for the random mapping (open symbols) and the linearly Boolean separable mapping (filled symbols) for $N = 5^2$ (○), $10^2$ (△) and $20^2$ (▽). The error bars are smaller than the sizes of the symbols and the lines are guides to the eye.

A most appealing feature of the ACH is that the dynamics always freezes in a homogeneous absorbing configuration, and so the algorithm halts. We must note, however, that the ACH is a stochastic heuristic, since the same initial configuration of the lattice can lead to different absorbing configurations depending on the sequence of site updates. The fact that the dynamics eventually freezes allows us to define a relaxation time for the ACH, which is a quite unexpected bonus for a stochastic heuristic. Accordingly, in Fig. 5 we show the scaled average relaxation time T/N as a function of the input size F. The unsurprising fact that T scales linearly with the number of agents N is manifested by the coincidence of the results for different lattice sizes. The instructive result here is that T grows only with the square of the input size. This result will be useful for the evaluation of the overall computational demand of the ACH (see Sect. V).

The effect of the use of linearly Boolean separable input-output mappings on the measured performance of ACH can be appreciated in Fig. 6, where we show a comparison between the performance of that heuristic for the random and the linearly Boolean separable mappings. As mentioned before, in the case of the random mapping the minimum cost is not necessarily zero and the global minimum is obtained through an exhaustive search in the configuration space (hence the restriction to F < 25). Although the random mapping seems to be a harder problem for the ACH, there is no qualitative difference between the dependence of our performance measure $P_m$ on the parameters N and F for the two mappings, and so our scaling results are likely to remain true for the random mapping as well.

To conclude our analysis, a word is in order about the impact of the connectivity between the agents on the performance of the ACH. It is well known that the expansion of the influence range of the agents, modeled by increasing the connectivity of the lattice [3TJ [55] or by placing the agents in more complex networks [53] (e.g., small-world and scale-free networks), results in the cultural homogenization of the population in Axelrod's model. Hence, it is not unreasonable to expect that by increasing the connectivity of the lattice (or network) the relaxation time would decrease and so the computational cost of the heuristic would be reduced. Alas, that is not so. In fact, the results of Fig. 7, which shows the scaled relaxation time T/N as a function of the connectivity C of a random symmetric network composed of $N = 10^2$ agents, indicate that T/N reaches a minimum around C = 4. As expected, we find that the probability $P_m$ of reaching the optimal solution is not affected by the choice of the connectivity C, and so the connectivity C = 4 yields the best performance, in the sense of the least computational cost, of the ACH for not too small F. In addition, the finding that the results of the random symmetric network with C = 4 are indistinguishable from the results obtained for the regular square lattice (data not shown) suggests that the topology of the network does not influence the performance of the ACH.
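The text does not specify how the random symmetric networks were built. One common reading, assumed here, is a random C-regular undirected graph, in which every agent has exactly C neighbours and links are mutual. A minimal sketch with the configuration model and rejection of self-loops and repeated links follows; the name `random_regular_graph` is hypothetical, and since the acceptance rate falls asymptotically like $\exp(-(C^2-1)/4)$, this simple method is practical only for small C.

```python
import numpy as np

def random_regular_graph(N, C, rng, max_tries=10_000):
    """Random C-regular undirected graph on N nodes (configuration model).

    Each node gets C 'stubs'; stubs are paired uniformly at random and the
    pairing is rejected if it creates a self-loop or a repeated link.
    Returns a list of neighbour lists.
    """
    if (N * C) % 2 != 0 or C >= N:
        raise ValueError("need N*C even and C < N")
    stubs = np.repeat(np.arange(N), C)
    for _ in range(max_tries):
        rng.shuffle(stubs)
        pairs = stubs.reshape(-1, 2)
        if np.any(pairs[:, 0] == pairs[:, 1]):
            continue                                   # self-loop: reject
        edges = {tuple(sorted(p)) for p in pairs.tolist()}
        if len(edges) < len(pairs):
            continue                                   # repeated link: reject
        nbrs = [[] for _ in range(N)]
        for a, b in edges:
            nbrs[a].append(b)
            nbrs[b].append(a)
        return nbrs
    raise RuntimeError("failed to generate a simple C-regular graph")
```

The neighbour lists returned here could replace the four lattice neighbours in an ACH simulation, with the neighbour drawn uniformly from `nbrs[i]`.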



V. CONCLUSION 

Understanding and quantifying how cooperation can improve the performance of groups of individuals in solving problems is an issue of great interest to many areas, ranging from computer science to business administration [24]. Our findings about the performance of the Adaptive Culture Heuristic (ACH) indicate that the number of agents participating in the collective solution of an optimization problem may influence the outcome of the process in a highly non-linear way (see, e.g., Fig.

FIG. 7. Scaled relaxation time of the ACH as a function of the connectivity C of random symmetric networks of $N = 10^2$ agents for F = 11 (□), 21 (▽), 31 (△) and 41 (○). Each symbol represents the average over $10^3$ distinct random symmetric networks of fixed connectivity. The error bars are smaller than the sizes of the symbols and the lines are guides to the eye.

Our results were derived for a particular NP-complete optimization problem, namely, the classification of linearly Boolean separable input patterns by a Boolean Binary Perceptron, whose optimal (zero-cost) solution is known by construction and which involves the manipulation of binary variables only. These two features allowed the study of the performance of the ACH for very large input sizes F - which essentially measures the 'size' of the optimization problem - and for a large number N of agents involved in the collective problem-solving task.

We focused on a single performance measure, $P_m$, which yields the probability that a run of the ACH finds an optimal solution, and found that it is a function of the reduced variable $u = F/N^{1/4}$ for $N > 30^2$ (see Figs. 2 and 3). This is a most remarkable and useful result, which informs how the number of agents must scale with the problem size for a given fixed performance of the ACH, namely, $N \propto F^4$. Recalling that the scaled relaxation time T/N scales with $F^2$ (see Fig. 5), we find that the overall computational cost to find an optimal solution with a fixed probability scales with $F^6$. As mentioned in Sect. I, this finding has no bearing on the NP ≠ P conjecture of computer science. In addition, a surprising result, summarized in Fig. 7, indicates that the implementation of the ACH on a square lattice or on a random symmetric network of connectivity C = 4 yields the best performance when compared with the implementation on a random network of different connectivity.
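The two scaling laws combine as follows. This is only the arithmetic behind the $F^6$ statement, using the fit $T/N = 0.12F^2$ of Fig. 5 (measured for $N = 30^2$ to $50^2$), and it assumes that this fit continues to hold for the larger N required at large F:

```latex
% Overall cost at fixed success probability P_m = g(u)
u = \frac{F}{N^{1/4}} = \text{const} \;\Longrightarrow\; N \propto F^{4},
\qquad
\frac{T}{N} \approx 0.12\,F^{2} \;\Longrightarrow\; T \approx 0.12\,N F^{2} \propto F^{4}\cdot F^{2} = F^{6}.
```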

It would be most interesting to find out whether the $F^6$ scaling law derived for the problem of learning linearly separable patterns by a Boolean Binary Perceptron holds for other optimization problems as well. In that case, one would have revealed a genuine property of the ACH which, given the minimal nature of the underlying social interaction mechanism, might serve as a bound on the performance of heuristics based on collective computation.